ConCreTE Logo

Context and Datasets

The CONTEXT + DATASET Library is a curated collection of datasets designed specifically for teaching responsible data science. Each dataset is paired with rich contextual information—such as background, stakeholder perspectives, considerations, and intended use—to encourage critical thinking and reflection in data-driven decision-making.

Key Purpose: To enable faculty to replace outdated or decontextualized training datasets with materials that support intentional, values-driven instruction.

Why It Matters: Responsible data science requires more than technical accuracy—it demands an understanding of why the data exists, who it impacts, and how it should be used. This library provides that missing layer of context to help students meaningfully engage with real-world implications.

Definition of Responsible Data Science: The intentional alignment of data-based tools and processes (planning, development, application, and improvement) with the values of communities and organizations—empowering positive decisions and mitigating harm.

Intended Use:

  • Faculty can integrate datasets directly into coursework, case studies, or project-based assignments.
  • Supports learning outcomes related to ethics, stakeholder engagement, equity, and critical analysis.
  • Structured around connecting context (introductory narrative, role, and business objectives, and products) with datasets (data dictionary, original dataset, provenance, derived datasets)
  • Potential responsible data science opportunities are provided for faculty to leverage is helpful but also to allow for faculty the flexibility to use the dataset in a way that fits through course learning objectives

C+DS items are built out into RDSO Scenarios that are broken down into Data Science Tasks with buildable skills.

Explore the Contexts and Datasets

Filter
  • Rising rates of diet-related health issues have led to growing public scrutiny of the fast food industry, particularly the nutritional content of menu items. EatWell Insights, a health-focused food analytics company, partners with public health organizations to evaluate and improve food choices in the fast food sector. In this scenario, learners will work with nutrition data from MenuStat, a publicly available database that tracks menu items and their nutrition information across major fast food chains in the U.S.

  • Declining civic participation, particularly in voting and volunteering, has raised concerns about democratic engagement and social trust in the U.S. Civic Pulse Analytics, a nonprofit research firm, partners with local governments and advocacy groups to analyze civic behaviors and develop targeted engagement strategies.

  • As artificial intelligence becomes more powerful and embedded across industries, policymakers and technologists face growing pressure to establish transparent, accountable, and globally coordinated AI governance frameworks.

  • The U.S. Conservation Reserve Program (CRP), intended to support sustainable farming and soil conservation, is struggling to meet its goals amid declining farmer participation and outdated targeting strategies. AgriPulse Analytics, a nonprofit research group focused on sustainable agriculture policy, provides data-driven insights to inform federal conservation program improvements. Learners will work with USDA Agricultural Census data accessed via an ArcGIS Feature Service, analyzing land use patterns and conservation enrollment across U.S. counties.

  • AI is playing an increasingly central role in disease prediction, helping healthcare providers shift from reactive to preventative care—but concerns remain about accuracy, fairness, and clinical integration. The National Human Genome Research Institute (NHGRI) is exploring how AI models trained on structured health data can assist clinicians in identifying individuals at risk for chronic conditions like heart disease.

  • Pittsburgh’s economy has shifted toward technology and healthcare, but manufacturing remains a critical sector, employing thousands and contributing significantly to regional GDP.

  • The consumer electronics industry faces ongoing competition as brands balance innovation, pricing, and promotional strategies to capture market share

  • The field of public health is grappling with increasing autism spectrum disorder (ASD) prevalence and evolving demographic patterns, underscoring the need for equitable early diagnosis.

  • • Gender-affirming care insurance coverage in the U.S. faces potential rollbacks under new federal health policy shifts. • The policy changes are unfolding amid broader healthcare and LGBTQIA+ equity debates led by a federal agency.

  • • Canada's automotive manufacturing sector is poised for transformation amid growing pressure to expand electric vehicle (EV) production and sales. • The scenario centers on a Canadian automotive manufacturer grappling with evolving industry dynamics, regulatory EV mandates, and shifting consumer demand.

  • The banking industry faces a growing threat from AI-enhanced fraud schemes that bypass traditional detection systems. TrustNet Bank, a mid-sized financial institution, serves both personal and small business custom

  • • The retail industry is undergoing a rapid shift as companies leverage AI and data analytics to optimize store layouts, product placement, and customer experiences. • Lowe’s, a major U.S. home improvement retailer, is using AI, spatial intelligence, and digital twin simulations to redesign store layouts and align inventory with shifting customer trends.

  • In a competitive, data-rich retail landscape, modern analytics empower companies to make rapid, evidence-based decisions for inventory, marketing, and customer engagement.

  • The pharmaceutical industry faces mounting concerns over the safety and oversight of medications produced through global supply chains, especially with growing reliance on Chinese manufacturing. https://en.wikipedia.org/wiki/Pharmaceutical_industry_in_China?utm_source=chatgpt.comhttps://prosperousamerica.org/china-owns-americas-generic-drug-supply-chain-u-s-china-commission-says/?utm_source=chatgpt.comhttps://www.uscc.gov/sites/default/files/2019-11/Chapter%203%20Section%203%20-%20Growing%20U.S.%20Reliance%20on%20China%E2%80%99s%20Biotech%20and%20Pharmaceutical%20Products.pdf?utm_source=chatgpt.com

  • The consumer electronics industry—particularly wearables—faces growing scrutiny as device defects can lead to physical injuries and regulatory penalties.

  • The retail industry faces increasing pressure to optimize supply chains amid shifting consumer spending and growing demand for faster delivery. Target Corporation, a leading U.S. retailer, is investing heavily in expanding regional distribution centers to improve grocery supply chain efficiency and better serve customers.

  • Parkinson’s disease management faces challenges due to the need for frequent clinical assessments, especially for patients with mobility issues or living remotely. Leeds Teaching Hospitals has introduced a smartphone app to facilitate remote symptom tracking and improve patient care. The learner will work with a telemonitoring dataset containing speech recordings and clinical scores from Parkinson’s patients to explore remote symptom monitoring.

  • Esophageal cancer treatment outcomes remain a major challenge in oncology, with perioperative chemotherapy showing promise in improving survival rates.

  • The rapid growth of artificial intelligence has spurred a surge in global venture funding, reshaping the technology and innovation landscape.

  • The rise of sophisticated deepfake technology has increased risks of identity fraud in the financial services industry. A leading financial institution is working to strengthen its fraud detection capabilities to safeguard customer assets and maintain trust.

  • Urban wildlife management faces challenges balancing ecological health with human interaction in public parks. The Central Park Conservancy oversees the care and study of New York City's Central Park, including its diverse animal populations.

  • Motor vehicle fatalities have sharply increased in New York, raising urgent concerns about traffic safety and public health in the transportation industry.

  • Urban forestry plays a critical role in improving city livability and combating climate change within the environmental management industry.

  • Food insecurity remains a pressing social issue within urban public health and community development sectors. Philadelphia-based nonprofits and government agencies are collaborating to address youth hunger and improve access to nutritious foods across the city.

  • The banking industry faces rapid transformation driven by economic shifts, regulatory changes, and technological innovation

  • Childcare accessibility and affordability remain critical issues within the early childhood education and social services sectors. The Pittsburgh Foundation plays a vital role in supporting community initiatives to improve childcare resources and outcomes in the region.

  • This scenario addresses trends in higher education enrollment within the community college industry. The source organization collects and analyzes enrollment data to track shifts and inform institutional planning

  • This scenario focuses on crime data analysis within the public safety and law enforcement industry. The FBI, through its Uniform Crime Reporting (UCR) program, collects nationwide crime statistics to support crime monitoring and policy decisions

  • This scenario addresses youth health and risk behaviors within the public health and education sectors. The CDC administers the Youth Risk Behavior Survey to monitor behaviors contributing to the leading causes of morbidity and mortality among adolescents.

  • This scenario explores disparities in pediatric health outcomes, focusing on inequities faced by children of color within the healthcare and public health sectors. The

  • Public health data reporting varies widely across states, impacting how disease trends are tracked and managed within the healthcare and government data management industry.

  • The recent outbreak of Sudan Ebola virus disease in Uganda highlights critical challenges in infectious disease control within the global public health sector.

  • This scenario addresses the challenges of disaster response and recovery within the humanitarian and geospatial data industries. The Harvard Humanitarian Initiative plays a crucial role by collecting, analyzing, and disseminating geospatial data to support relief efforts in disaster-affected regions.

  • The scenario focuses on the challenge of managing dilapidated and abandoned properties, which pose risks to community safety and strain city resources within the urban housing and municipal services industry

  • In today’s data-driven world, the ability to analyze and interpret data is an essential skill across industries—from healthcare to finance to public policy. However, many high school and community college students face barriers to engaging with data science, including limited access to mentors, uncertainty in developing research questions, and challenges in working with datasets. DataJam was created to bridge this gap, fostering data literacy and critical thinking through real-world problem-solving. To enhance this experience, the Virtual DataJam Helper introduces AI-powered mentors, such as ChatGPT, to guide students through the research process. This tool supports learners in refining their hypotheses, identifying relevant datasets, and navigating data exploration. By embedding responsible data science principles, this initiative empowers students to ask meaningful questions, work with functional data, and develop confidence in using AI tools—preparing them for further education and careers in an increasingly data-centric world.

  • https://www.truthinaccounting.org/news/detail/falling-revenue-soaring-costs-threaten-pittsburghs-financial-future?utm_source=chatgpt.com

  • Affordable housing remains a critical issue in Pittsburgh, with initiatives focused on creating funding solutions and tracking housing stability.

  • The banking industry is undergoing a significant transformation driven by large investments in artificial intelligence and advanced technologies aimed at improving customer experience and operational efficiency.

  • Vacant and tax-delinquent properties pose challenges for Pittsburgh neighborhoods, complicated by outdated ownership records.

  • In the mid-1990s, the Group Insurance Commission (GIC) in Massachusetts released anonymized health records for research purposes, believing that removing explicit identifiers like names would protect individuals' privacy. However, by combining this data with publicly available voter registration information, individuals could be re-identified, exposing sensitive health information.

  • In 2006, AOL published a dataset containing 20 million search queries from 650,000 users, intending it for research use. Although the data was anonymized by replacing usernames with unique identification numbers, reporters from The New York Times were able to identify individuals based on the content of their search queries, revealing personal and sensitive information.

  • In 2006, Netflix released an anonymized dataset of movie ratings from approximately 500,000 subscribers as part of a public machine learning competition. Researchers Arvind Narayanan and Vitaly Shmatikov demonstrated that by cross-referencing this dataset with non-anonymous IMDb user reviews, they could re-identify specific individuals, uncovering their personal viewing histories.

  • During the 2010 Haiti earthquake, researchers utilized anonymized mobile phone data to track population movements. This information was crucial in understanding the spread of cholera and coordinating effective humanitarian responses. By analyzing call patterns, researchers could estimate the movement of people both in response to the earthquake and during the related cholera outbreak, enabling rapid and accurate interventions.

  • During the Ebola outbreak in West Africa, telecommunications companies shared aggregated mobile data to help track and predict the spread of the virus. This collaboration allowed health organizations to allocate resources more effectively and implement targeted containment strategies.

  • The manufacturing and industrial sector is increasingly embracing digital transformation to enhance productivity and innovation. General Electric (GE) is undergoing a company-wide digital overhaul to optimize operations and product development.

  • News.Oper.010 News.Oper.010

    Digital marketing firms increasingly rely on user engagement data to optimize ad performance, but without observability into how data flows and changes across systems, ad targeting can become inefficient or biased.

  • A mysterious “coding issue” in Equifax’s AI-driven credit scoring system led to consumers receiving inaccurate credit scores in early 2022, illustrating risks within the financial services and data infrastructure industry.